Evaluation of Language Identification Methods

Authors

  • Dale Gerdemann
  • Simon Kranig

Abstract

Language identification plays a major role in several Natural Language Processing applications, where it is mostly used as a preprocessing step. Various approaches have been proposed for this task, and recognition rates have increased considerably: today, identification rates of up to 99% can be reached even for short inputs. This paper gives an overview of these approaches, explains how they work, and comments on their accuracy. In the remainder of the paper, three freely available language identification programs are tested and evaluated.
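The abstract does not specify which approaches the paper covers. As one illustration of how a text-based identifier can work, a widely cited family of methods compares character n-gram frequency profiles. The following is a minimal sketch in the spirit of Cavnar and Trenkle's (1994) rank-based "out-of-place" measure. It is not necessarily one of the programs evaluated in the paper. The function names, the toy training sentences in SAMPLES, and the limits n_max=3 and top_k=300 are hypothetical choices made for illustration.

```python
from collections import Counter


def ngram_profile(text: str, n_max: int = 3, top_k: int = 300) -> list[str]:
    """Return the top_k most frequent character n-grams (n = 1..n_max), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # "_" marks word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending frequency, then alphabetically so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile: list[str], lang_profile: list[str]) -> int:
    """Sum of rank differences; n-grams missing from the language profile get a maximum penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def identify(text: str, lang_profiles: dict[str, list[str]]) -> str:
    """Pick the language whose profile is closest to the input's profile."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Hypothetical toy training texts; a real system would build profiles from large corpora.
SAMPLES = {
    "en": "the cat sat on the mat and the dog is in the house with the children",
    "de": "die katze sitzt auf der matte und der hund ist in dem haus mit den kindern",
    "fr": "le chat est sur le tapis et le chien est dans la maison avec les enfants",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    print(identify("der hund ist in dem haus", PROFILES))
```

Because only ranks are compared, the measure is insensitive to input length, which is one reason such profiles can work on short inputs. With toy profiles this small, however, results on unseen text are unreliable. Real profiles need far more training data.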


Similar resources

Comparison of Spectral Methods for Spoken Language Identification

Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...

Cultural Adaptation of Sniffin’ Sticks Smell Identification Test: The Malaysian Version

Introduction: Sniffin’ Sticks smell identification test is a tool used for evaluation of olfactory function but the results are culture-dependent. It relies on the subject’s familiarity to the odorant and descriptors. This study aims to develop the Malaysian version of Sniffin’ Sticks smell identification test suitable for local population usage. Materials and Methods:   The o...

Design, Implementation and Evaluation of Software to Increase Users’ Awareness and Facilitate the Identification of the Most Appropriate Centers Providing Laboratory Services in Tehran Province

Background and Aim: Medical diagnostic laboratories are among the most important centers in the treatment cycle of patients. Today, the conscious choice of such laboratories is one of the challenges that patients face in the treatment process. This study was conducted with the aim of improving the knowledge of software users in the field of laboratory sciences and also facilitating the consciou...

Evaluation of language identification methods using 285 languages

Language identification is the task of giving a language label to a text. It is an important preprocessing step in many automatic systems operating with written text. In this paper, we present the evaluation of seven language identification methods that was done in tests between 285 languages with an out-of-domain test set. The evaluated methods are, furthermore, described using unified notatio...

Evaluation and Statistical Validation of Black-Spots Identification Methods

Despite the identification of crash hotspots as a first step of the roads safety management process, with various effective black spots identification (HSID) methods, only a few researchers have compared the performance of these methods; also it is not clear which test is the most consistent in the black-spots identification. In this research, seven commonly applied HSID methods (accident frequ...

Comparing Natural Language Identification Methods based on Markov Processes

We discover and experiment with categorization-based methods to natural language identification. Two approaches to language identification based on Markov processes are compared, both methods treat the incoming text on the character level. We performed series of experiments with the aim to make certain of high precision in language identification task of selected methods and also with the objec...

Publication date: 2005